A Comparison of Word Frequency and N-Gram Based Vulnerability Categorization Using SOM

Authors

  • Melanie Tupper
  • Nur Zincir-Heywood
Abstract

Network attackers exploit software vulnerabilities on networked computers to facilitate successful attacks. Many organizations keep track of existing software vulnerabilities in the form of vulnerability databases. However, categorizing vulnerabilities is difficult because of the large number of different attributes maintained. In this work we apply a data clustering algorithm, the self-organizing map (SOM), to two different representations of the information contained in an existing online vulnerability database. After identifying the more valuable approach for this task, we are able to identify critical vulnerability features inherent in the dataset.
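As a rough illustration of the comparison the abstract describes, the sketch below builds a word-frequency and a character n-gram representation of a few short descriptions and trains a very small SOM on each. Everything in it is an assumption made for illustration: the function names (`word_counts`, `char_ngrams`, `vectorize`, `train_som`, `best_unit`), the 2x2 map, the learning-rate and radius schedules, and the invented example descriptions. The abstract does not state the authors' preprocessing, n-gram type or size, map dimensions, or training parameters, so this is not their method.

```python
import math
import random
from collections import Counter

def word_counts(text):
    """Word-frequency representation: count of each lowercase token."""
    return Counter(text.lower().split())

def char_ngrams(text, n=3):
    """Character n-gram representation: count of each n-character window."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def vectorize(counters):
    """Map sparse counts onto a shared vocabulary and L2-normalize each row."""
    vocab = sorted(set().union(*counters))
    rows = []
    for c in counters:
        row = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, v):
    """Grid coordinate of the unit whose weight vector is closest to v."""
    return min(weights, key=lambda u: sq_dist(weights[u], v))

def train_som(data, rows=2, cols=2, epochs=50, seed=0):
    """Train a tiny SOM; the map size and schedules are illustrative guesses."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly from 1 toward 0
        lr = 0.5 * frac                      # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for v in rng.sample(data, len(data)):
            br, bc = best_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # neighborhood weight
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

# Invented descriptions, not taken from any real vulnerability database.
descriptions = [
    "buffer overflow in ftp server allows remote code execution",
    "buffer overflow in mail server allows remote code execution",
    "sql injection in web login form exposes user data",
    "sql injection in web search form exposes user data",
]
for name, rep in (("words", word_counts), ("3-grams", char_ngrams)):
    vectors = vectorize([rep(d) for d in descriptions])
    som = train_som(vectors)
    print(name, [best_unit(som, v) for v in vectors])
```

Each printed list gives the map unit assigned to each description under one representation; comparing how descriptions group on the two maps is the kind of side-by-side inspection the abstract alludes to, though on real data one would use a larger map and a dedicated SOM library.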


Similar resources

A Comparison of Text-Categorization Methods Applied to N-Gram Frequency Statistics

This paper gives an analysis of multi-class e-mail categorization performance, comparing a character n-gram document representation against a word-frequency based representation. Furthermore the impact of using available e-mail specific meta-information on classification performance is explored and the findings are presented.


A Comparison of Support Vector Machines and Self-Organizing Maps for e-Mail Categorization

This paper reports on experiments in multi-class document categorization with support vector machines and self-organizing maps. A data set consisting of personal e-mail messages is used for the experiments. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on th...


Using Word Sequences for Text Summarization

Traditional approaches for extractive summarization score/classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the inconvenience of being domain dependent. In this paper, we propose to tackle this problem representing the sentences by word sequences (n-grams), a widely used representati...


Language-independent text categorization by word N-gram using an automatic acquisition of words

We previously proposed the accumulation method, a language-independent text classification method that is based on character N-grams. The accumulation method does not depend on the language structure because this method uses character N-grams to form


Improving Chinese Word Segmentation by Adopting Self-Organized Maps of Character N-gram

Character-based tagging method has achieved great success in Chinese Word Segmentation (CWS). This paper proposes a new approach to improve the CWS tagging accuracy by combining Self-Organizing Map (SOM) with structured support vector machine (SVM) for utilization of enormous unlabeled text corpus. First, character N-grams are clustered and mapped into a low-dimensional space by adopting SOM al...




Publication date: 2008